NSF PAR Search | NSF Public Access Repository


SelfCodeAlign: Self-Alignment for Code Generation

Wei, Yuxiang; Cassano, Federico; Liu, Jiawei; Ding, Yifeng; Jain, Naman; Mueller, Zachary; de Vries, Harm; von Werra, Leandro; Guha, Arjun; Zhang, Lingming (December 2024, NeurIPS 2024)

Instruction tuning is a supervised fine-tuning approach that significantly improves the ability of large language models (LLMs) to follow human instructions. For programming tasks, most models are finetuned with costly human-annotated instruction-response pairs or those generated by large, proprietary LLMs, which may not be permitted. We propose SelfCodeAlign, the first fully transparent and permissive pipeline for self-aligning code LLMs without extensive human annotations or distillation. SelfCodeAlign employs the same base model for inference throughout the data generation process. It first extracts diverse coding concepts from high-quality seed snippets to generate new tasks. It then samples multiple responses per task, pairs each with test cases, and validates them in a sandbox environment. Finally, passing examples are selected for instruction tuning. In our primary experiments, we use SelfCodeAlign with CodeQwen1.5-7B to generate a dataset of 74k instruction-response pairs. Finetuning on this dataset leads to a model that achieves a 67.1 pass@1 on HumanEval+, surpassing CodeLlama-70B-Instruct despite being ten times smaller. Across all benchmarks, this finetuned model consistently outperforms the original version trained with OctoPack, the previous state-of-the-art method for instruction tuning without human annotations or distillation. Additionally, we show that SelfCodeAlign is effective across LLMs of various sizes, from 3B to 33B, and that the base models can benefit more from alignment with their own data distribution. We further validate each component’s effectiveness in our pipeline, showing that SelfCodeAlign outperforms both direct distillation from GPT-4o and leading GPT-3.5-based distillation methods, such as OSS-Instruct and Evol-Instruct. SelfCodeAlign has also led to the creation of StarCoder2-Instruct, the first fully transparent, permissively licensed, and self-aligned code LLM that achieves state-of-the-art coding performance. 
Overall, SelfCodeAlign shows for the first time that a strong instruction-tuned code LLM can result from self-alignment rather than distillation.
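The validation step described in the abstract (sample several responses per task, pair each with test cases, run them in a sandbox, keep the ones that pass) can be illustrated with a minimal sketch. This is not the authors' implementation: `passes_tests`, `select_passing`, and the hard-coded candidates are hypothetical names and data, a subprocess with a timeout only stands in for a real sandbox, and keeping every passing pair is an assumption, since the abstract does not state the selection policy.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def passes_tests(response: str, tests: str, timeout_s: float = 10.0) -> bool:
    """Run a response together with its tests in a fresh interpreter.

    Returns True only if the script exits with status 0. A subprocess with a
    timeout is a stand-in here; it is NOT a real security sandbox.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(response + "\n\n" + tests + "\n")
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return False
        return result.returncode == 0


def select_passing(instruction, candidates):
    """Keep every (response, tests) candidate whose tests pass (assumed policy)."""
    return [
        {"instruction": instruction, "response": response}
        for response, tests in candidates
        if passes_tests(response, tests)
    ]


# Hypothetical sampled outputs; in the pipeline these would come from the base model.
candidates = [
    ("def add(a, b):\n    return a - b", "assert add(2, 3) == 5"),  # buggy
    ("def add(a, b):\n    return a + b", "assert add(2, 3) == 5"),  # correct
]
kept = select_passing("Write add(a, b) that returns the sum.", candidates)
```

Only candidates whose own test cases succeed survive the filter, so the resulting instruction-response pairs carry an execution-based check rather than relying on the model's output alone.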
Full Text Available
SelfCodeAlign: Self-Alignment for Code Generation

Wei, Yuxiang; Cassano, Federico; Liu, Jiawei; Ding, Yifeng; Jain, Naman; Mueller, Zachary; de Vries, Harm; von Werra, Leandro; Guha, Arjun; Zhang, Lingming (December 2024, NeurIPS 2024 (Curran Associates))

Full Text Available
Evaluating Language Models for Efficient Code Generation

Liu, Jiawei; Xie, Songrun; Wang, Junhao; Wei, Yuxiang; Ding, Yifeng; Zhang, Lingming (August 2024, OpenReview)

Full Text Available
Magicoder: Empowering Code Generation with OSS-Instruct

Wei, Yuxiang; Wang, Zhe; Liu, Jiawei; Ding, Yifeng; Zhang, Lingming (July 2024, ACM)

Full Text Available
XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts

https://doi.org/10.18653/v1/2024.acl-long.699

Ding, Yifeng; Liu, Jiawei; Wei, Yuxiang; Zhang, Lingming (January 2024, Association for Computational Linguistics)

Full Text Available
Copiloting the Copilots: Fusing Large Language Models with Completion Engines for Automated Program Repair

https://doi.org/10.1145/3611643.3616271

Wei, Yuxiang; Xia, Chunqiu Steven; Zhang, Lingming (November 2023, ACM)

Full Text Available
Automated Program Repair in the Era of Large Pre-trained Language Models

Xia, Chunqiu Steven; Wei, Yuxiang; Zhang, Lingming (July 2023, Proceedings of the IEEE/ACM International Conference on Software Engineering)

Full Text Available
Coverage-guided tensor compiler fuzzing with joint IR-pass mutation

https://doi.org/10.1145/3527317

Liu, Jiawei; Wei, Yuxiang; Yang, Sen; Deng, Yinlin; Zhang, Lingming (April 2022, Proceedings of the ACM on Programming Languages)

In the past decade, Deep Learning (DL) systems have been widely deployed in various application domains to facilitate our daily life, e.g., natural language processing, healthcare, activity recognition, and autonomous driving. Meanwhile, it is extremely challenging to ensure the correctness of DL systems (e.g., due to their intrinsic nondeterminism), and bugs in DL systems can cause serious consequences and may even threaten human lives. In the literature, researchers have explored various techniques to test, analyze, and verify DL models, since their quality directly affects the corresponding system behaviors. Recently, researchers have also proposed novel techniques for testing the underlying operator-level DL libraries (such as TensorFlow and PyTorch), which provide general binary implementations for each high-level DL operator and are the foundation for running DL models on different hardware platforms. However, there is still limited work targeting the reliability of the emerging tensor compilers (also known as DL compilers), which aim to automatically compile high-level tensor computation graphs directly into high-performance binaries for better efficiency, portability, and scalability than traditional operator-level libraries. Therefore, in this paper, we target the important problem of tensor compiler testing, and have proposed Tzer, a practical fuzzing technique for the widely used TVM tensor compiler. Tzer focuses on mutating the low-level Intermediate Representation (IR) for TVM due to the limited mutation space for the high-level IR. More specifically, Tzer leverages both general-purpose and tensor-compiler-specific mutators guided by coverage feedback for diverse and evolutionary IR mutation; furthermore, since tensor compilers provide various passes (i.e., transformations) for IR optimization, Tzer also performs pass mutation in tandem with IR mutation for more effective fuzzing. 
Our experimental results show that Tzer substantially outperforms existing fuzzing techniques on tensor compiler testing, with 75% higher coverage and 50% more valuable tests than the 2nd-best technique. Also, different components of Tzer have been validated via ablation study. To date, Tzer has detected 49 previously unknown bugs for TVM, with 37 bugs confirmed and 25 bugs fixed (PR merged).
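The idea of coverage-guided fuzzing with IR mutation and pass mutation can be illustrated with a toy loop. This is a sketch under heavy assumptions, not Tzer's algorithm and not the TVM API: the "IR" is a tuple of integers, `PASSES`, `run_compiler`, `mutate_ir`, `mutate_passes`, and `fuzz` are hypothetical names, coverage is a set of hand-written labels, and choosing one of the two mutation kinds at random per iteration is a simplification of mutating them in tandem.

```python
import random

PASSES = ("fold", "reverse", "dedupe")  # hypothetical pass names, not TVM passes


def run_compiler(ir, passes):
    """Toy 'compiler': apply passes to a tuple of ints, return covered labels."""
    covered = set()
    for name in passes:
        if name == "fold":
            covered.add("fold")
            if len(ir) >= 2:
                covered.add("fold:merge")
                ir = (ir[0] + ir[1],) + ir[2:]
        elif name == "reverse":
            covered.add("reverse")
            ir = ir[::-1]
        elif name == "dedupe":
            covered.add("dedupe")
            if len(set(ir)) < len(ir):
                covered.add("dedupe:hit")
                ir = tuple(dict.fromkeys(ir))
    if sum(ir) > 100:
        covered.add("large-sum")
    return covered


def mutate_ir(ir, rng):
    """Insert, overwrite, or delete one element of the toy IR."""
    ir = list(ir)
    choice = rng.randrange(3)
    if choice == 0 or not ir:
        ir.insert(rng.randrange(len(ir) + 1), rng.randrange(64))
    elif choice == 1:
        ir[rng.randrange(len(ir))] = rng.randrange(64)
    else:
        del ir[rng.randrange(len(ir))]
    return tuple(ir)


def mutate_passes(passes, rng):
    """Replace one pass in the sequence or insert a new one."""
    passes = list(passes)
    if passes and rng.random() < 0.5:
        passes[rng.randrange(len(passes))] = rng.choice(PASSES)
    else:
        passes.insert(rng.randrange(len(passes) + 1), rng.choice(PASSES))
    return tuple(passes)


def fuzz(iterations=2000, seed=0):
    """Keep a mutant in the seed pool only if it reaches new coverage."""
    rng = random.Random(seed)
    pool = [((1, 2), ("fold",))]
    total = set(run_compiler(*pool[0]))
    for _ in range(iterations):
        ir, passes = rng.choice(pool)
        if rng.random() < 0.5:
            ir = mutate_ir(ir, rng)
        else:
            passes = mutate_passes(passes, rng)
        covered = run_compiler(ir, passes)
        if not covered <= total:  # some label not seen before
            total |= covered
            pool.append((ir, passes))
    return total, pool


coverage, pool = fuzz()
```

Some labels in this toy, such as `dedupe:hit`, are reachable only when a particular pass meets a particular IR shape, which is the intuition behind mutating both the IR and the pass sequence instead of only one of them.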
Full Text Available
